Ontology Population using Corpus Statistics
Abstract
This paper presents a combination of algorithms for automatic ontology building based mainly on lexical co-occurrence statistics. We populate an ontology with hypernymy links; strictly speaking, the result is therefore a taxonomy of lexical units (nouns organized by hypernymy relations) rather than an ontology of formally defined concepts. A set of combined statistical procedures produces fragments of taxonomies from corpora, and a central algorithm later integrates these fragments into a unified taxonomy. Our results show that an ensemble of different components can achieve an accuracy only slightly below human performance. Finally, because our methods are based on quantitative linguistics, the proposed algorithm is not language-specific, although the experiments were conducted on Spanish.
Similar resources
The GENIA Corpus: an Annotated Research Abstract Corpus in Molecular Biology Domain
With the information overload in genome-related fields, there is an increasing need for natural language processing (NLP) technology to extract information from the literature, and various attempts at information extraction using NLP have been made. We are developing the necessary resources, including a domain ontology and an annotated corpus built from research abstracts in the MEDLINE database (the GENIA corpus). We are ...
Dealing with Large Corpora for Ontology Population
Multilingual ontology population from texts, i.e., the addition of new terms to an ontology, requires a suitable parallel or comparable corpus. In this paper, we aim to check whether the corpus selected for our project suits the ontology we want to populate. The corpus for ontology population should not only reflect a specific domain and have a sufficient volume of data, as discussed in (Delpech et ...
Ontoprima: a Prototype for Automating Ontology Population
Ontology population supports ontology building by handling the complex task of instantiating an ontology. Performing this process manually is both expensive and time-consuming, which naturally leads to attempts at fully or partially automating knowledge acquisition and absorption in general and ontology population in particular. This paper presents OntoPRiMa...
Populating Categories using Constrained Matrix Factorization
Matrix factorization methods are a well-scalable means of discovering generalizable information in noisy training data with many examples and many features. We propose a method to populate a given ontology of categories and seed examples using matrix factorization with constraints, based on a large corpus of noun-phrase/context cooccurrence statistics. While our method performs reasonably well ...
Rule-based Named Entity Extraction For Ontology Population
Currently, text analysis techniques such as named entity recognition rely mainly on ontologies, which represent the semantics of an application domain. To build such an ontology from specialized texts, this article presents a tool that detects proper names, locations, and dates in texts using manually written linguistic rules. The most challenging task is to extract not only entities but al...
Publication date: 2015